ABSTRACT

We demonstrate the utility of dendrograms for representing the essential features of the hierarchical structure
of the isosurfaces for molecular line data cubes. The dendrogram of a data cube is an abstraction of the changing
topology of the isosurfaces as a function of contour level. The ability to track hierarchical structure over a range
of scales makes this analysis philosophically different from local segmentation algorithms like CLUMPFIND.
Points in the dendrogram structure correspond to specific volumes in data cubes defined by their bounding
isosurfaces. We further refine the technique by measuring the properties associated with each isosurface in the
analysis, allowing for a multiscale calculation of molecular gas properties. Using COMPLETE $^{13}$CO ($J = 1 \to 0$)
data from the L1448 region in Perseus and mock observations of a simulated data cube, we identify regions
that have a significant contribution by self-gravity to their energetics on a range of scales. We find evidence for
self-gravitation on all spatial scales in L1448, though not in all regions. In the simulated observations, nearly
all of the emission is found in objects that would be self-gravitating if gravity were included in the simulation.
We reconstruct the size-line width relationship within the data cube using the dendrogram-derived properties
and find it follows the standard relation: $\sigma_v \propto R^{0.58}$. Finally, we show that constructing the dendrogram of CO
($J = 1 \to 0$) emission from the Orion-Monoceros region allows for the identification of giant molecular clouds
in a blended molecular line data set using only a physically motivated definition (self-gravitating clouds with
masses $> 5 \times 10^{4}\ M_\odot$).

Subject headings: ISM:clouds — ISM: structure — methods: analytical — techniques: image processing 



1. INTRODUCTION 

The structure in molecular clouds determines, in part, the
locations, numbers and masses of newly formed stars. Because
of its important role in establishing the initial mass function
of stars as well as the local star formation rate, great effort
has been invested in characterizing the structure of this
gas. Observations of molecular clouds including molecular
and atomic line surveys, extinction and infrared emission
mapping, and star counts have all been used to characterize
the nature of molecular clouds. A myriad of analytic techniques
have been applied to these data with a broad range
of results. Each technique is designed to highlight a different
feature of the gas: fractal analysis techniques are used
to demonstrate that the gas is fractal (Stutzki et al. 1998);
searches for clumps utilize clump identification algorithms
(Stutzki & Güsten 1990); studies characterizing turbulence
frequently aim to measure theoretically relevant quantities
such as the power spectrum (Lazarian & Pogosyan 2000) or
the structure function (Heyer & Brunt 2004).

One of the dominant characteristics of molecular gas is
that it is hierarchical. A preponderance of multi-tracer studies
have consistently shown that the high-density features in
molecular clouds have relatively small physical scales and are
invariably contained inside envelopes of lower density gas
(e.g., Blitz & Stark 1986; Lada 1992). Moreover, the hierarchy
is non-trivial: for any given scale, there are more small-scale,
dense structures than there are large-scale, sparse structures.
Dense cores are the top level of the cloud hierarchy
and the turbulence that characterizes molecular clouds makes
a transition to coherence (i.e., domination by thermal rather
than turbulent motions) on $\sim 0.1$ pc scales (Goodman et al.
1998; Tafalla et al. 2004; Lada et al. 2007). These dense
cores are the exclusive hosts of star formation inside molecular
clouds and much effort has been expended to study the
properties of these cores and the stars that form inside them
(e.g., di Francesco et al. 2007; Ward-Thompson et al. 2007
and references therein). Indeed, studies have argued for the
close relationship between the dense cores and the newly
formed stars based on the similarities of their mass functions
(Motte et al. 1998; Testi & Sargent 1998; Alves et al. 2007;
2003).



The low-density gas that fills the majority of the volume 
of the molecular cloud can be regarded as the bottom of the 
gas hierarchy in the molecular clouds (though the filling frac- 
tion and chemical state of the molecular cloud is far from uni- 
form). The chemical change associated with the formation of 
star-forming (molecular) clouds has commonly been used to 
define discrete clouds in the interstellar medium (ISM) serv- 
ing as a useful division between the diffuse, multi-phase ISM 
and star-forming clouds. However, there is some debate about
whether the boundaries of molecular clouds form a meaningful
bottom of the hierarchy and are distinct entities (the
"classical" interpretation; Blitz et al. 2007) or whether the
hierarchical structure continues with only chemical changes
into the diffuse ISM (e.g., Ballesteros-Paredes et al. 1999;
Hartmann et al. 2001). The crux of the debate centers around
lifetimes of the molecular clouds relative to their internal
crossing times or, equivalently, the importance of self-gravity
in the cloud's energetics. However, much of this debate has
centered on considering disparate sets of observations and
Elmegreen (2007) has presented a synthesis that argues
relatively long-lived (20-30 Myr) self-gravitating clouds can
accommodate local, rapid star formation within them, accounting
for the sets of observations that drive an apparent
contradiction in measurements of cloud lifetimes.

Connecting molecular clouds to the atomic gas in and
around them is particularly difficult since the 21-cm observations
that would be directly comparable to molecular line
studies suffer from fore- and background confusion as well
as an intrinsic degradation of spatial resolution from the long
wavelength of the emission. For some cases, where geometry
(Pound & Goodman 1997), self-absorption (Li & Goldsmith
2003), or modeling of photodissociation regions (Bensch
2006) allows, the atomic gas related to molecular clouds can
be studied. Studying the hierarchical structure within molecular
clouds requires a large spatial dynamic range, which
restricts useful observational data sets to galactic objects.
Although the hierarchical structure of the ISM continues to large
scales in the galaxies, the above restrictions limit considerations
of hierarchical structure within star forming clouds to
those found in the gas traced by molecular emission.

This paper presents another analytic technique aimed to 
characterize the hierarchical structure in molecular gas and 
relate it to the star formation process. We use dendro- 
grams to graphically represent hierarchical structure of nested 
isosurfaces in three-dimensional molecular line data cubes 
(i.e. position-position-velocity data cubes). The dendrograms 
are abstractions of how the isosurfaces nest inside one another.
Our principal contribution in this work is using standard
molecular line analysis techniques to characterize the
branches in a dendrogram allowing for simultaneous mea- 
surement of various properties on a range of physical scales. 
In addition, dendrograms are a reduction of the structure in 
a data set to its essential features and, as such, they become 
useful reductions of large data sets to simple models allowing 
the study of a wide range of spatial scales. 

The dendrograms presented here are simply an alternative
application of the structure trees presented first in
Houlahan & Scalo (1992, hereafter HS92). While novel
at the time for the star formation community, such diagram
techniques were relatively common in other disciplines (West
2000). In the intervening time since the publication of HS92,
the analysis of tree networks has become even more developed
and tools for the construction and analysis of the resulting
structure trees have become commonplace (e.g., we will
apply software in the standard IDL distribution for the following
analysis). Our application of the dendrogram formalism
uses a significantly different analytic approach compared
to the work of HS92. They analyzed the characteristics of
the structure trees derived from two-dimensional data. The
present work uses dendrograms as an abstraction of the isosurfaces
present in three-dimensional data, emphasizing the
properties of those isosurfaces. Finally, note that the application
of dendrograms to contour surfaces as in this work and
HS92 is significantly different from their common application
in statistical analysis (e.g., Ghazzali et al. 1999) where they
are used to represent clustering in statistical data sets. We refer
to HS92's structure trees as dendrograms to be consistent
with the nomenclature adopted in other fields, in particular
that of the statistical description of hierarchical systems.

This paper briefly discusses different approaches to molecular
line data (§2) before developing the concept of dendrograms
(§3). We discuss several refinements of the dendrogram
technique including accounting for the effects of noise (§3.1),
measuring cloud properties on dendrogram branches (§4), and
the complications of mapping between observed and physical
domains (§4.1). We conclude with two applications of the
dendrogram technique: an analysis of self-gravity in L1448
and the identification of GMCs in blended data sets.



2. THE ANALYSIS OF MOLECULAR LINE DATA 

Broadly speaking, the statistical analysis of molecular line 
data has usually followed one of two paths. Either authors 
construct statistical descriptions of the emission from an en- 
tire molecular line data set, or authors will segment (divide) 
the data into what they believe to be physically relevant structures
and study the distribution of properties in the resulting
population of objects. Common examples of statistical
analysis include fractal analysis (Elmegreen & Falgarone
1996; Stutzki et al. 1998), $\Delta$-variance (Stutzki et al. 1998;
Bensch et al. 2001), correlation functions (Houlahan & Scalo
1990; Rosolowsky et al. 1999; Lazarian & Pogosyan 2000)
and Principal Component Analysis (Heyer & Brunt 2004).
Statistical analyses produce many interesting comparisons be- 
tween and among data, but the physical interpretation of the 
statistics can be complicated. The most useful applications 
of the statistical approach tend to be in comparative measure- 
ments between two observational data sets or between obser- 
vations and a simulation (e.g. Padoan et al. 2006). 

The segmentation and identification techniques are favored
in the case where the emission is thought to be comprised of
physically important substructures. In molecular line astronomy,
the classic example of the segmentation approach is
the generation of GMC catalogs for the inner galaxy where
GMCs are identified as connected regions of emission above
a threshold intensity (Solomon et al. 1987; Scoville et al.
1987). Unfortunately, the results of this approach are controlled
by the sensitivity and resolution of the data set^1. The
two applications of the segmentation approach that have most
shaped molecular line astronomy, particularly with regards to
the field of star formation, are the clump identification algorithms
of Williams et al. (1994) and Stutzki & Güsten (1990).
The clumpy substructure of molecular clouds was first identified
by eye (Blitz & Stark 1986) and this structure is thought
to be important in establishing the sites of star formation.
Williams et al. (1994) applied a watershed segmentation algorithm
to molecular cloud data to identify "clumps" within
the cloud (the now-famous CLUMPFIND algorithm). The
CLUMPFIND algorithm has spawned many subsequent applications
and its utility is discussed elsewhere (Pineda et al.,
in preparation). Where CLUMPFIND is driven by the structure
in the data and precludes finding overlapping objects,
the GAUSSCLUMP algorithm of Stutzki & Güsten (1990)
(later revisited by Kramer et al. 1998) iteratively fits
three-dimensional Gaussians to the data cube to identify structures in
the data. Both algorithms have been used to define the mass
spectrum of clumps within molecular clouds, usually finding
$\alpha = -1.5$ to $-1.9$ for $dN/dM \propto M^{\alpha}$. It should be noted that
CLUMPFIND and GAUSSCLUMP are not intended to produce
the same partitioning of a data cube; they adopt substantially
different starting assumptions with a corresponding
difference in the results. The results of CLUMPFIND and,
to a lesser extent, GAUSSCLUMP are influenced by their
user-defined parameters and algorithmic design, which are
designed to mimic the "by eye" identification.

3. THE DENDROGRAM TECHNIQUE 

The dendrogram technique presented here combines the robustness
of the statistical approach with the direct link to
structure in the data explored in the segmentation and identification
approach. The analysis of dendrograms presented
in HS92 highlights the utility of the method for characterizing
two-dimensional extinction maps using a few simple statistics.

^1 Sensitivity and resolution effects also contaminate analysis using
statistical methods, but it is possible to correct for these effects (e.g.,
Bensch et al. 2001).

We begin by considering images in general, without a specific
astronomical data type in mind. This section emphasizes
ideal data where the presence of noise does not interfere with
structure identification. As discussed in HS92, a dendrogram
is a graphical representation of the primitive structure within
an image of arbitrary dimension. It is the skeleton of the object
containing only information about the structures and
substructures within a contour diagram of the object. A schematic
of the dendrogram technique is shown in Figure 1 for a
one-dimensional emission profile. If the emission profile were
thresholded^2 at level $I_1$, a single connected region results. In
contrast, if the profile is thresholded at $I_2$, two distinct objects
will result, corresponding to each of the local maxima. The
level $I_{crit}$ represents the critical boundary between these two
regimes, below which two objects merge into a single object.
The dendrogram is a scheme to track the structure as a function
of contour level in the profile and thus it represents the
essential information about the structure of the object. The
dendrogram also encodes where the composite object combines
with the third distinct object.
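
As a rough illustration of thresholding a one-dimensional profile, the following Python sketch counts the connected regions above a given contour level. The profile values, the function name `count_regions`, and the chosen levels are hypothetical and are not taken from Figure 1; they only mimic a profile with three local maxima.

```python
def count_regions(profile, level):
    """Count contiguous runs of samples lying strictly above `level`."""
    regions = 0
    inside = False  # True while we are walking through a region
    for value in profile:
        above = value > level
        if above and not inside:
            regions += 1  # entering a new connected region
        inside = above
    return regions


# Hypothetical emission profile with local maxima of height 5, 7 and 4.
profile = [0, 2, 5, 3, 7, 2, 1, 4, 0]

# Number of distinct objects as the contour level is lowered.
for level in (6.0, 3.5, 1.5, 0.5):
    print(level, count_regions(profile, level))
```

Lowering the level first reveals isolated peaks and then merges them; the levels at which the count drops are the merge levels that a dendrogram records.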

For two dimensional data, a common analogy is to think 
of the dendrogram technique as a descriptor of a submerged 
mountain chain. If the overlying water were drained away, 
first the peaks of mountains would appear as isolated objects. 
As successively more water is drained, the peaks would merge 
together into larger objects. The dendrogram encodes infor- 
mation about which objects merged together and at what con- 
tour levels they did so. To plot a dendrogram of this data we 
can flatten the two dimensional structure into one dimension 
but doing so eliminates any positional information in the tree. 

A useful formalism for interpreting dendrograms in three 
dimensions is to consider each point in the dendrogram as 
representing an isosurface (3D contour) in the data cube at a 
given level. If an arbitrary data set is thresholded at a fixed 
contour level, it breaks up into one or more distinct regions. 
The bounding surfaces of these volumes are the isosurfaces 
represented in the dendrograms, with each distinct surface 
corresponding to a point in the dendrogram. We identify the 
distinct surfaces by the set of local maxima that they contain. 
Over a range of contour intervals with no mergers, threshold- 
ing the data at slightly higher or lower level will produce the 
same essential features, namely the same number of distinct 
regions containing the same local maxima. Hence, the den- 
drogram will be comprised of vertical branches. The length 
of these branches corresponds to the range of contour levels 
over which a set of isosurfaces is unchanged (though the ac- 
tual volume will change). There are specific contour levels in 
the data above which a pair of volumes will be distinct and 
below which the two volumes are joined. We refer to these 
critical levels as the merge levels. Below the merge level, a 
single isosurface contains both sets of local maxima that de- 
fined the distinct surfaces above the merge level. To represent 
this change in the topology of the isosurfaces, we connect the 
two branches of the dendrogram at the merge level. 

A sample dendrogram is shown in Figure 2 (top) representing
the $^{13}$CO ($1 \to 0$) emission from the L1448 dark cloud in
Perseus (Ridge et al. 2006). There is no spatial information
encoded in the x-axis of the plot but rather the ordering of
the leaves is chosen so that the branches of the dendrogram
do not cross. This choice facilitates visualizing the hierarchical
structure in the data at the expense of retaining the geometrical
relations between the leaves. The information on
the spatial relationships between the objects is retained in this
analysis and can be used to label maps with the regions of
dendrograms they correspond to, though it is not shown in
the dendrogram. The construction of dendrograms, including
the effects of noise, is described in more detail
below.

^2 Thresholding is the mapping of a real-valued image to a binary image
with all data above the threshold set to 1 and all data below set to 0.

3.1. Determining the Leaves of a Dendrogram 

In the following sections, we specifically consider radio- 
line data cubes in position-position-velocity space (PPV) with 
intensities given in brightness temperatures (Kelvin). Such 
observational data are invariably contaminated by noise which 
interferes with the dendrogram process. The structure of the 
dendrogram is determined entirely by the local maxima in 
the data. A local maximum, by definition, has a small re- 
gion around it containing no data values larger than the local 
maximum and, hence, a distinct isosurface containing only 
that local maximum can be drawn. The local maxima deter- 
mine the top level of the dendrogram, which we refer to as the 
leaves, defined as the set of isosurfaces that contain a single 
local maximum. 

In noiseless data, every local maximum in the data would 
correspond to an actual emission feature in the data. Unfortu- 
nately, in real data, noise will mask the low-amplitude varia- 
tions in the emission structure resulting in spurious local max- 
ima that do not correspond to real structure in the data. In the 
dendrogram method, we suppress the effects of these noise 
fluctuations by rejecting local maxima that are likely caused 
by noise. 

We describe our algorithm here in more detail considering, 
without loss of generality, that we are examining only a sin- 
gle cloud of emission such that a low-lying contour will con- 
tain all the emission of interest in a cloud. The initial leaves 
of the dendrogram are selected by identifying all local max- 
ima and then rejecting maxima that are likely to be caused by 
noise. We generate a list of all local maxima by identifying
all pixels in the image that have data values larger than all
of their neighbors over a box $D_{max} \times D_{max} \times \Delta V_{max}$ in PPV
space where $D_{max}$ and $\Delta V_{max}$ are free parameters. A non-trivial
box size ($D_{max}$ and $\Delta V_{max}$ greater than one pixel/channel)
reduces the numbers of candidates that must be checked for
significance against our noise-suppression criteria. The algorithm
becomes insensitive to structure in the data cube on
scales less than a box size. If the box is too large, significant 
structures are suppressed. Since the rejection of local maxima 
only simplifies the dendrogram by considering a subset of the 
structurally defining features, reducing the size of the noise- 
suppression box can be used to check if an essential feature 
has been eliminated. 
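
The search for candidate maxima over a box in PPV space can be sketched in Python as follows. This is a minimal illustration, assuming a cube stored as nested lists indexed `cube[v][y][x]`, odd box widths centred on each pixel, truncation of the box at the cube edges, and rejection of ties; the function name `local_maxima` and these conventions are assumptions, not the implementation used in the paper.

```python
from itertools import product


def local_maxima(cube, d_max, dv_max):
    """Return (v, y, x) indices whose value exceeds every other value in a
    box of d_max x d_max spatial pixels and dv_max channels centred on them."""
    nv, ny, nx = len(cube), len(cube[0]), len(cube[0][0])
    hd, hv = d_max // 2, dv_max // 2  # half-widths of the search box
    offsets = [o for o in product(range(-hv, hv + 1),
                                  range(-hd, hd + 1),
                                  range(-hd, hd + 1)) if o != (0, 0, 0)]
    maxima = []
    for v, y, x in product(range(nv), range(ny), range(nx)):
        centre = cube[v][y][x]
        is_max = True
        for dv, dy, dx in offsets:
            vv, yy, xx = v + dv, y + dy, x + dx
            inside = 0 <= vv < nv and 0 <= yy < ny and 0 <= xx < nx
            if inside and cube[vv][yy][xx] >= centre:
                is_max = False  # a neighbour is at least as bright
                break
        if is_max:
            maxima.append((v, y, x))
    return maxima


# Hypothetical 3 x 5 x 5 cube with two peaks of different height.
cube = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
cube[1][1][1] = 5.0
cube[1][3][3] = 3.0
```

With a 3 x 3 x 3 box both peaks are candidates; widening the spatial box to 5 pixels suppresses the fainter peak, which illustrates how a large box hides structure on scales smaller than the box.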

After the initial generation of local maxima, the set is then
decimated by removing local maxima that are likely to result
from noise. For each pair of candidate maxima, we find the
highest shared isosurface that contains both maxima. This
isosurface is the merge level, a high-dimensional analog of
the contour level at the saddle point shown in Figure 1. For
the merge level, we calculate (a) the volume uniquely associated
with each maximum and (b) the difference in antenna
temperature between the merge level and each local maximum.
We remove any local maximum for which the volume
of the isosurfaces that contain only that maximum is less than
some minimum number of pixels ($N_{min}$, usually taken to be
4). Furthermore, we only recognize a significant bifurcation
in structure when both local maxima are more than a given
interval $\Delta T_{max}$ above the highest contour level that contains
both of the maxima, i.e., the level at which the objects merge.
Such a criterion has been used previously in data cube analysis
(Brunt et al. 2003; Rosolowsky & Blitz 2005): noise fluctuations
will typically only produce maxima with characteristic
height $\sigma_{rms}$, so variations significantly larger than this are
nominally real. If this criterion is not fulfilled, we reject the
lower of the two local maxima and consider the emission profile
to represent only a single object. We note that the resulting
dendrogram using a decimated set of local maxima represents
a set of isosurfaces that are a subset of the isosurfaces that
would be considered including all local maxima (see Figure
2).

Fig. 1. Schematic diagram of the dendrogram process. The left panel shows a one-dimensional emission profile with three distinct local maxima. The
dendrogram of the region is shown in blue and repeated in the right panel where the components of the dendrogram are labeled. The left-hand panel indicates
three characteristic contour levels through the data. Thresholding at $I_1$ produces a single object whereas thresholding at $I_2$ produces two. The level separating
these two regimes is indicated as $I_{crit}$.

Fig. 2. Dendrograms of CO emission in L1448. The top panel shows
the dendrogram of L1448 using the standard algorithm parameters. The bottom
panel shows the dendrogram after relaxing the conditions for noise suppression,
resulting in more independent leaves in the dendrogram (see the end
of §3.1). However, the basic structure of the dendrogram remains the same;
the isosurfaces used in the top plot are a subset of those used in the bottom.
Each leaf of the dendrogram is labeled in the top plot and the corresponding
leaf is identified in the bottom figure. Leaves appearing in both dendrograms
also have a circle at their tip in the bottom plot.

Hence, the initial leaves of the dendrogram are determined
by four free parameters: $D_{max}$, $\Delta V_{max}$, $\Delta T_{max}$, and $N_{min}$.
By default, these are set to be $D_{max} = 3$ and $\Delta V_{max} = 7$ resolution
elements, $\Delta T_{max} = 4\sigma_{rms}$ and $N_{min} = 4$ pixels for independent
pixels. The bottom panel of Figure 2 shows the resulting
dendrogram for $D_{max} = 1$ and $\Delta V_{max} = 3$ resolution elements
and $\Delta T_{max} = 0$. Figure 3 is a schematic diagram illustrating
the definition of these parameters. Changing $\Delta T_{max}$ results in
the largest changes in the dendrograms for typical radio line
data since a larger fraction of the local maxima fail the check
against the contrast than any other noise suppression criterion.
The default values represent a compromise between sensitivity
to dendrogram structure and algorithm performance.

Fig. 3. Schematic diagram of the parameters that determine the decimation
of local maxima. The same profile as in Figure 1 is used. The local
maximum indicated with the white point would be considered a valid local
maximum if (a) it is the highest point in a window $D_{max}$ on either side
of it (and an analogous width $\Delta V_{max}$ in velocity space), (b) the interval
between the maximum and the highest merger level with a valid local maximum
satisfies $\Delta T > \Delta T_{max}$, and (c) the number of pixels associated with the shaded region
is larger than $N_{min}$. These criteria restrict the analysis to the subset of local
maxima that are most distinct.
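
The contrast and pixel-count criteria can be illustrated with a simplified one-dimensional Python sketch. It assumes that only adjacent maxima are compared, that the merge level of a pair is the minimum of the profile between them, and that the pixel count is the contiguous run around a peak lying strictly above that merge level; the function names and the test profile are hypothetical, and the pairwise bookkeeping in a real PPV cube would be more involved.

```python
def pixels_above(profile, peak, level):
    """Count contiguous samples around `peak` lying strictly above `level`."""
    left = peak
    while left - 1 >= 0 and profile[left - 1] > level:
        left -= 1
    right = peak
    while right + 1 < len(profile) and profile[right + 1] > level:
        right += 1
    return right - left + 1


def decimate(profile, maxima, delta_t, n_min):
    """Drop maxima failing the contrast (delta_t) or size (n_min) criteria."""
    survivors = sorted(maxima)
    changed = True
    while changed and len(survivors) > 1:
        changed = False
        for a, b in zip(survivors, survivors[1:]):
            merge = min(profile[a:b + 1])  # saddle between adjacent maxima
            # Test the lower maximum first so that it is the one rejected.
            for peak in sorted((a, b), key=lambda i: profile[i]):
                too_faint = profile[peak] - merge <= delta_t
                too_small = pixels_above(profile, peak, merge) < n_min
                if too_faint or too_small:
                    survivors.remove(peak)
                    changed = True
                    break
            if changed:
                break  # restart with the updated list of survivors
    return survivors


# Hypothetical profile: a strong peak, a weak shoulder and an isolated spike.
profile = [0, 1, 5, 4.5, 4.8, 1, 0, 3, 0]
candidates = [2, 4, 7]
```

Setting `delta_t` to zero keeps all three candidates, whereas a contrast of one unit removes the shoulder at index 4, in line with the statement that the contrast check removes the largest fraction of candidates.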

Noise has an additional effect on the dendrogram: intensity
fluctuations can alter the levels at which two isosurfaces
merge. A positive fluctuation can join two surfaces at
a higher level than the surfaces would join in the absence of
noise. We have modeled the influence of the noise by comparing
the merge levels of surfaces in a model cube in the
absence of noise to those with noise added. We find, in general,
that the merge levels are uncertain on a scale of $\sim 2\sigma_{rms}$
with some variation based on algorithm parameters and the
precise model used. In addition, there is a bias towards merging
$\sim 1\sigma_{rms}$ higher than the surfaces would merge in the absence
of noise. The structure of the tree can only be considered
accurate for amplitude changes larger than $\sim 2\sigma_{rms}$,
and for scales smaller than this, the branching order may be 
transposed. HS92 discuss this effect in some detail for 2D im- 
ages and resort to a coarse binning of their trees to measure 
tree statistics (number of branches per node, etc.). We do not 
present tree statistics in this paper (see HS92), and we make 
an effort to account for the influence of noise in our results. 

3.2. Constructing the Dendrogram 

Practically, the dendrogram of an $N$-dimensional intensity
image is constructed by first identifying the local maxima
that will comprise the top level of the dendrogram hierarchy
(§3.1). Then, the data are contoured with a large number of
levels. For each contour value beginning with the maximum
level, the dendrogram algorithm checks whether each pair of
previously distinct regions have merged together. If so, the
contour level and which surfaces merged are recorded and the
next contour level is considered. We enforce binary mergers:
if three or more distinct objects merge into a single object
between one contour level and the next, we refine the separation
between contour levels so each merger involves only two objects.
The dendrogram (tree diagram) is constructed by drawing
vertical segments corresponding to contour levels where
the topology of the surfaces is unchanged and connecting
corresponding branches at the levels where isosurfaces merge.
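
The bookkeeping of merge levels can be sketched for a one-dimensional profile in Python. Note that this sketch swaps in a different technique from the stepped contouring described above: it visits pixels in order of decreasing intensity and joins them with a union-find structure, which is equivalent to contouring at every data value. The function name `merge_events` and the example profile are assumptions for illustration only.

```python
def merge_events(profile, leaves):
    """Return (level, leaves_a, leaves_b) for each merger between regions
    that both contain at least one accepted leaf (index of a local maximum)."""
    parent = {}   # union-find forest over pixels already above the level
    members = {}  # root pixel -> frozenset of leaves inside that region

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    events = []
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    for i in order:  # lower the contour level one pixel at a time
        parent[i] = i
        members[i] = frozenset([i]) if i in leaves else frozenset()
        for j in (i - 1, i + 1):  # 1-D neighbours
            if j not in parent:
                continue  # neighbour is still below the contour level
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            if members[ri] and members[rj]:
                # Two regions holding distinct leaves join at this level.
                events.append((profile[i], members[ri], members[rj]))
            parent[rj] = ri
            members[ri] = members[ri] | members.pop(rj)
    return events


# Hypothetical profile with accepted leaves at indices 2, 4 and 7.
profile = [0, 2, 5, 3, 7, 2, 1, 4, 0]
events = merge_events(profile, {2, 4, 7})
```

Each event lists the merge level and the two sets of leaves being joined, which is the information needed to connect two branches of the tree. In higher dimensions the neighbour loop would run over the chosen connectivity, and simultaneous multi-way mergers would need the level refinement described in the text.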

Both the identification of local maxima and the levels at 
which two surfaces can be considered to be merged are influ- 
enced by the choice of connectivity in the data set. Practically, 
astronomical data is pixellated into square (cubic) pixels. The 
connectivity of the data set is determined by the number of 
neighbors a given pixel is defined to have. In two dimensions, 
a pixel can have either four neighbors (those pixels that share 
edges of the pixel) or eight neighbors (those pixels that share 
corners or edges). In three dimensions, a cubic pixel can have 
either six neighbors (those pixels which share a face with the 
cubic pixel) or 26 neighbors (those pixels which share a face, 
edge or corner with a given pixel). An alternative definition in 
the three dimensional case considers a cubic pixel to have 18 
neighbors corresponding to those pixels which share a face or 
an edge, but we emphasize 6- and 26-connectivity in the three 
dimensional case to be analogous to the two-dimensional case
(see Williams et al. 1994 for further discussion). Two points
are in the same region if a path can be drawn from one point to 
the other through connected pixels which are all in the same 
region. For our analysis, we choose the minimum connectiv- 
ity (4 neighbors in the 2D case, 6 in the 3D), but we have ex- 
perimented with the maximum connectivity. Practically, the 
dendrogram changes by a small degree with corresponding 
mergers, on average, occurring at higher contour levels since 
it is "easier" for two regions to connect. 
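
The neighbor definitions above can be enumerated with a short Python sketch. The function name `neighbour_offsets` and the convention that the `connectivity` argument counts how many coordinates may differ by one pixel are assumptions made for this illustration.

```python
from itertools import product


def neighbour_offsets(ndim, connectivity):
    """Offsets to neighbours differing by one pixel in at most
    `connectivity` coordinates (1 = shared faces only)."""
    return [off for off in product((-1, 0, 1), repeat=ndim)
            if 0 < sum(abs(c) for c in off) <= connectivity]


# Neighbour counts for the cases discussed in the text.
counts_2d = [len(neighbour_offsets(2, c)) for c in (1, 2)]
counts_3d = [len(neighbour_offsets(3, c)) for c in (1, 2, 3)]
print(counts_2d, counts_3d)
```

The minimum connectivity adopted in the analysis corresponds to `connectivity=1` in this convention.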

4. MEASURING CLOUD PROPERTIES IN DENDROGRAMS 

Having developed a formalism where each point in the den- 
drogram corresponds to a unique isosurface in the data, we 
calculate the physical properties of the emission bounded by 
that isosurface. We can then use those physical properties to 
identify the relevant features in the data cube. Along branches 
of the dendrogram, the properties tend to be continuous func- 
tions of the contour level, while where two branches merge, 
the properties will change suddenly as a result of the merged 
object containing more emission. However, owing to the difficulty
in relating volumes in observed data space to volumes
in physical space, the measurement of properties from regions
of emission within the data cube is difficult to interpret (see
§4.1). In this section, we describe our methods for estimating
the size, line width, luminosity and mass contained within
each isosurface described in the dendrogram as well as the
complications that arise in doing so.

We calculate the macroscopic properties of the regions of
emission based on the moments of the volume weighted by
the intensity of emission coming from every pixel, following
Rosolowsky & Leroy (2006). The data cube consists of
a number of pixels that have sizes of $\delta x$, $\delta y$, and $\delta v$ in the two
spatial dimensions and the velocity dimension, respectively.
The $i$th pixel in the data cube has positions $x_i$ and $y_i$, velocity
$v_i$, and brightness temperature $T_i$. We assume that the region
under consideration is contiguous and bordered by an isosurface
in brightness temperature of value $T_{edge}$, so that all of
the pixels in the region have $T > T_{edge}$ and the pixels outside
the region have $T < T_{edge}$ or are separated from the region by
emission with $T < T_{edge}$.

We begin by rotating the spatial axes so that the x and y axes
align with the major and minor axis of the region, respectively.
We determine the orientation of the major axis using principal
component analysis. The size of the region is computed
as the geometric mean of the second spatial moments along
the major and minor axis. This is $\sigma_r$, the root-mean-squared
(RMS) spatial size:

$$\sigma_r(T_{edge}) = \sqrt{\sigma_{maj}(T_{edge})\,\sigma_{min}(T_{edge})} \qquad (1)$$

where $\sigma_{maj}(T_{edge})$ and $\sigma_{min}(T_{edge})$ are the RMS sizes derived
from the intensity-weighted second moments along the two
spatial dimensions.

$$\sigma_{maj}^2 = \frac{\sum_i w_i \left(x_i - \langle x \rangle\right)^2}{\sum_i w_i} \qquad (2)$$



where we have assumed the major axis lies along the x coordinate
and the sum runs over all pixels within the isosurface
($T > T_{edge}$). The weights in the moment are usually set to
the brightness temperature of each pixel: $w_i = T_i$. This particular
functional form for the cloud size is used since it has
been used in previous observational studies (Solomon et al.
1987) and explored in depth by Bertoldi & McKee (1992)
with respect to inclination, aspect ratio, and virialization.
We define a factor $\eta$ that relates the one-dimensional RMS
size, $\sigma_r$, to the radius of a spherical cloud $R$: $R = \eta \sigma_r$. We
take $\eta = 1.91$ for consistency with Solomon et al. (1987) and
Rosolowsky & Leroy (2006); the value of 1.91 merely reflects
the correction of the moment to the radius for the typical
concentration of emission found in molecular clouds.
The velocity dispersion ($\sigma_v$) is calculated as the second moment
of the velocity axis weighted by the data values, analogous
to the size measurement. The flux of the region is
the sum (zeroth moment) of all the emission in the region:
$F = \sum_i T_i\, \delta\theta_x\, \delta\theta_y\, \delta v$. To convert the flux to a luminosity, we
must assume a distance to the region. For a cloud at a distance
of $d$ (in parsecs), the physical radius will be $R_{pc} = R_{rad}\, d$ and
the luminosity will be $L = F d^2$, where the flux is measured in
units of K km s$^{-1}$ sr. For CO data, we calculate the mass of
the region by scaling the luminosity by a linear CO-to-H$_2$ conversion factor
(for intensities on the main beam temperature scale):



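$$\frac{M_{\rm lum}}{M_\odot} = \frac{X_{\rm CO}}{2\times10^{20}\ [{\rm cm^{-2}/(K\ km\ s^{-1})}]} \times 4.4\,\frac{L_{\rm CO}}{\rm K\ km\ s^{-1}\ pc^{2}} \equiv 4.4\,X_2\,L_{\rm CO} \tag{3}$$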



where $X_{\rm CO}$ is the assumed CO-to-H$_2$ conversion factor. This
calculation includes a factor of 1.36 (by mass) to account for
the presence of helium. Including helium is necessary to facilitate
comparison with the virial mass, which should reflect
all of the gravitating mass in the cloud. We have adopted
a fiducial value of the CO-to-H$_2$ conversion factor of
$X_{\rm CO} = 2\times10^{20}$ cm$^{-2}$ (K km s$^{-1}$)$^{-1}$ based on $^{12}$CO($1 \to 0$) observations
in the Milky Way (Strong & Mattox 1996; Dame et al.
2001) and express changes relative to this value in terms of
the parameter $X_2$.
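
The moment calculations above can be illustrated with a short numerical sketch. It is a minimal illustration under stated assumptions, not the implementation used for this work: the function name, argument names, and units (pixel offsets in radians, velocities in km s$^{-1}$, distance in pc) are hypothetical, the weights follow the $w_i = T_i$ choice, and the principal axes are taken from the eigenvalues of the intensity-weighted covariance matrix, which is one common way to carry out the principal component step.

```python
import numpy as np


def region_properties(x, y, v, T, dx, dy, dv, distance_pc, eta=1.91, X2=1.0):
    """Moment-based properties of one region, using weights w_i = T_i.

    Assumed (illustrative) units: x, y, dx, dy in radians; v, dv in km/s;
    T in K; distance_pc in parsecs.
    """
    w = np.asarray(T, dtype=float)
    pos = np.vstack([x, y]).astype(float)             # shape (2, N)
    offset = pos - np.average(pos, axis=1, weights=w)[:, None]

    # Intensity-weighted covariance of the positions. Its eigenvalues are the
    # second moments along the minor and major axes (ascending order).
    cov = (w * offset) @ offset.T / w.sum()
    var_min, var_maj = np.clip(np.linalg.eigvalsh(cov), 0.0, None)

    sigma_r = np.sqrt(np.sqrt(var_maj) * np.sqrt(var_min))   # Equation 1
    radius_pc = eta * sigma_r * distance_pc                  # R = eta * sigma_r

    v = np.asarray(v, dtype=float)
    v_mean = np.average(v, weights=w)
    sigma_v = np.sqrt(np.average((v - v_mean) ** 2, weights=w))

    flux = w.sum() * dx * dy * dv                    # K km/s sr
    luminosity = flux * distance_pc ** 2             # K km/s pc^2
    mass = 4.4 * X2 * luminosity                     # Equation 3, solar masses
    return {"sigma_r": sigma_r, "radius_pc": radius_pc, "sigma_v": sigma_v,
            "luminosity": luminosity, "mass": mass}
```

Passing the pixels inside a single isosurface returns the size, line width, luminosity, and luminous mass for that isosurface; repeating the call for each contour level gives the multiscale properties discussed in the text.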

For each property, we can estimate the uncertainty in the 
property caused by the noise in the data set. Assuming the 
coordinate axes are well-defined, the uncertainties are alge- 
braically propagated through the formulas for the physical 
properties. 

4.1. Physical Interpretation 

The major difficulty in using the properties calculated above
directly is that it is hard to ascribe meaning to a region
of emission defined by an isosurface. There is substantial
concern that the naive association of a closed object in
PPV space with an object in physical space, particularly as
defined by contours of intensity in a data cube, may be inaccurate
(Ostriker et al. 2001). From the observer's perspective
there seem to be three possible interpretations for the emission
in the data cube. Each of these interpretations leads to
a different set of cloud properties and yields different results
when applied to the same data set. These interpretations all
revolve around determining what the appropriate values of the
antenna temperature weights used in moments of the emission
($w_i$) should be (e.g., those in Equation 2). We graphically
summarize the three "paradigms" for measuring the properties
of isosurfaces in Figure 4.

4.1.1. The Bijection Paradigm 

The calculation of properties described above sets the weights to the native
values of brightness temperature drawn from the data cube,
$w_i = T_i$. This assumption essentially maps PPV space to physical
space (i.e., three spatial dimensions) in a 1-to-1 fashion
(one pixel in the data cube corresponds to a single volume in the
cloud). To the extent that this mapping holds, these weights are the correct choice.
This approach is the closest parallel to the CLUMPFIND
algorithm, which associates clumps of emission with clumps
of density in physical space. Under the assumption of uniform
excitation conditions, an isosurface of brightness corresponds
to a surface of constant opacity and hence of constant
column density. In the physical regime where higher column
densities are associated with higher physical densities, the bijection
paradigm may be ideally suited for measuring cloud
properties.

A bijection may be inappropriate because of two effects: 
the first is the superposition of multiple, distinct objects along 
the line of sight that have the same velocity. The second is 
that a given volume likely contributes emission at multiple 
velocities due to an intrinsically broad line profile. Both of 
these effects cause the bijection to be flawed: the first means 
that a given pixel contains emission from multiple objects and 
the second means that any given volume appears in multiple 
pixels. We can attempt to correct for either of these effects, 
but not both. 

4.1.2. The Clipping Paradigm 

In this approach, the region is considered to represent a
discrete object superimposed on a background of brightness
$T_{\rm edge}$. This approach assumes that any emission that can be associated
with other objects, by drawing a lower contour, is not
associated with the object at all. In this case, the properties of
the clouds should be calculated using weights $w_i = T_i - T_{\rm edge}$.
The resulting values of some representative properties are
shown in Figure 5 for the set of isosurfaces containing the
maximum in the L1448 data cube. In general, the clipping
tends to reduce all the properties, but affects the luminosity
most significantly. This assumption is very conservative and
the correct value would be derived using a weight value intermediate
between 0 and $T_{\rm edge}$.
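
The difference between the bijection and clipping weights can be made concrete with a small sketch. The function name and the sample brightness values below are illustrative assumptions; for simplicity the sketch selects pixels by threshold only and does not check that the region is contiguous.

```python
import numpy as np


def moment_weights(T, T_edge, paradigm="bijection"):
    """Weights w_i for pixels above the contour; pixels below it get zero."""
    T = np.asarray(T, dtype=float)
    if paradigm == "bijection":
        w = T                      # w_i = T_i
    elif paradigm == "clipping":
        w = T - T_edge             # w_i = T_i - T_edge
    else:
        raise ValueError(f"unknown paradigm: {paradigm}")
    return np.where(T > T_edge, w, 0.0)


T = np.array([0.5, 1.5, 3.0])      # brightness temperatures in K (made-up values)
sum_bijection = moment_weights(T, 1.0, "bijection").sum()
sum_clipping = moment_weights(T, 1.0, "clipping").sum()
```

Because the zeroth moment is a direct sum of the weights, clipping lowers the luminosity at every pixel, whereas the second moments depend only on how the weights are distributed and change less.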

4.1.3. The Extrapolation Paradigm 

In this paradigm, the properties of the region are extrapolated
to the zero-intensity isosurface. The extrapolation corrects
for the fact that some of the emission arising from the object
is not contained within the contour drawn in PPV space.
Instead of quoting the properties of the measured region, the
extrapolation reports the properties implied for the entire region
as inferred from the part found above $T_{\rm edge}$. An analogy
for this correction is that we are predicting the underwater
shape of an island volcano from the visible region above the
water.

The extrapolation is carried out by considering the behavior
of a property, say $R$, as a function of $T_{\rm edge}$. For a given
$T_{\rm edge}$, we extrapolate $R(T_{\rm edge})$ to a value of $T_{\rm edge} = 0$ K based
on the behavior of $R(T'_{\rm edge})$ for all $T'_{\rm edge} > T_{\rm edge}$. This method
is described in more detail in Rosolowsky & Leroy (2006),
in particular in their Figure 2. In short, the second moments
are linear extrapolations for data above $T_{\rm edge}$ whereas the zeroth
moments are quadratic extrapolations. The behavior of
the extrapolation can be traced on Figure 5. The value of
the extrapolated radius is always larger than for the other two
paradigms and the margin of difference is most substantial at
large contour values since the range of the extrapolation is the
largest. At small values of $T_{\rm edge}$, the extrapolated value oscillates
around 1.5 pc, the final radius of the cloud.
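
The extrapolation step can be sketched as a polynomial fit evaluated at zero intensity. This is a simplified illustration, not the procedure of Rosolowsky & Leroy (2006): it uses an unweighted least-squares fit, and the function name and toy data are assumptions made for the example.

```python
import numpy as np


def extrapolate_to_zero(levels, values, order):
    """Least-squares polynomial fit of values(levels), evaluated at T = 0 K."""
    coeffs = np.polyfit(levels, values, deg=order)
    return float(np.polyval(coeffs, 0.0))


levels = np.array([1.0, 1.5, 2.0, 2.5, 3.0])      # contour levels in K (made-up)
sigma = 0.8 - 0.2 * levels                        # toy second moment, linear in T
flux = 10.0 - 3.0 * levels + 0.2 * levels ** 2    # toy zeroth moment, quadratic in T

sigma_0 = extrapolate_to_zero(levels, sigma, order=1)   # linear for second moments
flux_0 = extrapolate_to_zero(levels, flux, order=2)     # quadratic for zeroth moments
```

For a starting contour at a high level, only a short run of levels is available for the fit and the extrapolation spans a long range, which is why the extrapolated values differ most from the other paradigms at large contour values.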

It is the latter quadratic extrapolation that produces the
noise on the black curve in the left panel of Figure 5. While
this method corrects for the emission associated with the object
at low intensity values, it effectively adds emission to the
objects so that the sum of the extrapolated objects from high
intensity values may be larger than the amount of emission
contained in the data cube. Hence, ratios such as the virial parameter
(§4.2) will be more accurate under this assumption than
will integrated properties like radius or mass as a function of
contour level.

For the extrapolation paradigm, the dominant source of errors
can be the actual data used in the fit and the errors can be
assessed by bootstrapping the data used in the extrapolation
(Press et al. 1992). See Rosolowsky & Leroy (2006) for more
details.
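
A bootstrap estimate of the extrapolation error can be sketched as follows. The resampling scheme shown (drawing contour levels with replacement and refitting) is an assumption chosen for illustration; the function name, the number of resamples, and the fixed seed are likewise hypothetical and are not taken from the references above.

```python
import numpy as np


def bootstrap_error(levels, values, order, n_boot=500, seed=0):
    """Scatter of the T = 0 K extrapolation over bootstrap resamples."""
    levels = np.asarray(levels, dtype=float)
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw the same number of (level, value) pairs with replacement.
        idx = rng.integers(0, levels.size, size=levels.size)
        # Skip resamples with too few distinct levels to constrain the fit.
        if np.unique(levels[idx]).size <= order:
            continue
        coeffs = np.polyfit(levels[idx], values[idx], deg=order)
        estimates.append(np.polyval(coeffs, 0.0))
    return float(np.std(estimates))
```

The standard deviation of the refitted zero-intensity values serves as the uncertainty on the extrapolated property.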

4.2. The Virial Parameter 

We adopt the virial parameter as defined in
McKee & Zweibel (1992) as a diagnostic of the energetic
state of the regions in the dendrogram. In this case, the
virial parameter $\alpha$ is defined as:



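$$\alpha = \frac{5\,\sigma_v^2\,R}{G\,M_{\rm lum}} = \frac{5\,\sigma_v^2\,R}{4.4\,G\,X_2\,L_{\rm CO}} \tag{4}$$

where $L_{\rm CO}$ is measured in units of K km s$^{-1}$ pc$^2$. For $\alpha < 2$,
the object is self-gravitating in the absence of other forces.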
Magnetic fields, surface pressures and bulk motions will all 
affect the dynamical state of the cloud. Since such terms are
Fig. 4. — Graphical summary of the three paradigms for interpreting isosurfaces of emission investigated in this work. The figure shows the same one-dimensional
emission profile and a contour level for each of the three cases. The shaded area shows the emission used to compute cloud properties (and is
proportional to the luminosity). A bar is shown below the position axis indicating the relative extent of a moment-based size measurement under the three
schemes. The standard interpretation of isosurfaces is the bijection scheme where elements in observational space correspond directly to objects in physical
space. In the clipping paradigm, only emission above the contour level is associated with an object. In the extrapolation scheme, all the elements above the
contour level are used to infer the behavior of the calculated properties in an extrapolation to the zero intensity isosurface. A similar set of characterizations
would hold if velocity were the coordinate axis and the line width was measured.




Fig. 5. — The behavior of calculated properties as a function of threshold level in the dendrogram for the set of isosurfaces containing the maximum of the
L1448 data cube. The behavior of the cloud properties along the highlighted path is shown for the luminosity (left) and radius (right). Three curves are shown
in each of these panels representing three possible ways of calculating the cloud properties at a given $T_{\rm edge}$. The bijection paradigm is shown as a dashed curve
($w_i = T_i$). The dotted curve depicts the clipping treatment ($w_i = T_i - T_{\rm edge}$), and the solid curve shows the result of extrapolating the emission to the zero intensity
isosurface.



not readily measurable, we must adopt this simple estimate 
for the dynamical state with the understanding that it is only 
an approximation. The utility of the diagnostic is most likely 
in a relative sense rather than an absolute one. We should regard
regions with $\alpha < 2$ as regions where significant amounts
of gravitational potential (mass) are found with comparatively
little kinetic energy so that gravity is likely important. In
the remainder of the paper, we refer to such regions as "self- 
gravitating" though the description is subject to the caveats 
above. 
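
As a worked illustration of this diagnostic, the sketch below evaluates a virial parameter and applies the $\alpha < 2$ threshold. It assumes the commonly used form $\alpha = 5\sigma_v^2 R/(GM)$; the function names, the rounded value of $G$ in these units, and the sample inputs are illustrative choices.

```python
G_PC_MSUN_KMS = 4.301e-3   # gravitational constant in pc Msun^-1 (km/s)^2


def virial_parameter(sigma_v_kms, radius_pc, mass_msun):
    """alpha = 5 sigma_v^2 R / (G M), the form assumed in this sketch."""
    return 5.0 * sigma_v_kms ** 2 * radius_pc / (G_PC_MSUN_KMS * mass_msun)


def is_self_gravitating(alpha, threshold=2.0):
    """Apply the alpha < 2 criterion used in the text."""
    return alpha < threshold


# Made-up region: 0.5 km/s line width, 1 pc radius, 500 solar masses.
alpha = virial_parameter(sigma_v_kms=0.5, radius_pc=1.0, mass_msun=500.0)
```

Since the other terms in the energy budget are neglected, the value is best read in a relative sense, comparing regions within the same data set.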

One concern that arises is the meaning of the virial parameter
under the three different approaches to calculating the region
properties that go into it. When adopting
the bijection approach, measuring the virial parameter for
emission contained above a given contour could be inaccurate
since the omission of the "wings" of the cloud would
affect the size and line width measurements more than the
luminosity measurement. Under the clipping approach, the
assumption that none of the emission below a given contour
level is associated with the object results in similar size and
line widths while dramatically reducing the luminosity. As a
result, this assumption likely overestimates the virial parameter
for the region. Finally, the virial parameter measured in
the extrapolated case characterizes the dynamical state of the
region implied by the emission found above a given contour
level. The extrapolation method is most useful for characterizing
objects for which the zero intensity isosurface is a meaningful
boundary (i.e., discrete clouds rather than substructure
within clouds). Given these considerations and our emphasis
on using the virial parameter to estimate the dynamical state
of structures in the data, we adopt the bijection scheme for
characterizing substructure, basically interpreting the isosurfaces
in the data as corresponding to nested regions of successively
higher (column) density. When identifying clouds or
other objects for which the zero intensity isosurface is a more
meaningful boundary, we adopt the extrapolation paradigm.
We discuss the influence of these choices further in
the application to observed data.



5. THE HIERARCHICAL SUBSTRUCTURE OF A MOLECULAR 
CLOUD 

In this section, we apply the dendrogram method to two 
data sets and demonstrate useful statistics for the characteri- 
zation of the trees and the tree-based properties. 

5.1. L1448 COMPLETE Data

Our primary observational data set for demonstrating the
dendrogram method is a section of the Coordinated Molecular
Probe Line Extinction Thermal Emission (COMPLETE) survey's $^{13}$CO map
of Perseus (Ridge et al. 2006), centered on the L1448 star-forming
region. The data cube spans a square region 40' on a
side, which projects to a region 3.1 pc × 3.1 pc at the distance
of Perseus (260 pc; Cernis 1993). The data have an angular
resolution of 46" and are sampled with 23" pixels. Since
the observational methods produce non-uniform noise across
the map, we add appropriately correlated noise to the original
data to produce a map with a spatially uniform noise rms of
$\sigma_{\rm rms} = 0.3$ K (on the main beam scale). The data cube has a velocity
resolution of 0.066 km s$^{-1}$, sampled every 0.066 km s$^{-1}$,
and spans 40 km s$^{-1}$, but the emission from L1448 only spans
a 10 km s$^{-1}$ section of the data. An integrated intensity map
of the cloud is shown in Figure 6 and channel maps are presented
in Figure 7. The channel maps highlight the presence
of a low velocity feature not otherwise discernible in the integrated
intensity maps ($v_{\rm LSR} \sim 0.5$ km s$^{-1}$). The main and
low velocity features are contained within a single connected
isosurface for contour levels < 1.5 K. Individual clumps corresponding
to the branches of the dendrograms (see below)
can be seen as well as the rough positions of the local maxima
used in our analysis of the region (§5.4).

5.2. Turbulent Simulation 

For comparison to the data, we also analyze simulation data
from Padoan et al. (2006). The data are taken from their simulated
$^{13}$CO emission maps generated from a 6 pc simulation
box with a mean density of $n = 10^3$ cm$^{-3}$. The simulation is
conducted using the Enzo code (Norman & Bryan 1999) to
simulate a $1024^3$ box using MHD with an initially uniform
density and periodic boundary conditions. The mean Mach
number in the simulation is $\mathcal{M} = 6$; the simulation is isothermal
and turbulence is driven in Fourier space at large scales.

For comparison with observed data, Padoan et al. (2006)
generated a simulated $^{13}$CO data set using a Monte Carlo radiative
transfer code using the density and velocity distributions
of the simulated material in a snapshot (i.e., radiation is
not included in the time propagation of the simulation). We
extracted a trial data set matching the spatial extent of the
observations from the full simulation box. Our selection included
the section of the box that contained the most compact
identifiable feature of emission. The trial data were convolved
to the resolution of the FCRAO maps from COMPLETE and
resampled in position and velocity to match the pixel size of
the observational data. Spatially correlated noise, mimicking
the noise in the FCRAO map, was added to the simulation data
to produce the same underlying noise rms in both data cubes.
Both the simulation and the observed data cubes will be affected
in the same manner by edges, resolution, and noise. The only

COMPLETE: http://www.cfa.harvard.edu/COMPLETE/



differences between the two data sets should be found in the
detailed structure of the emission. The simulation was compared
favorably to the full COMPLETE observational data set
in Padoan et al. (2006) based on similarities in the turbulent
power spectrum. However, the authors of that study emphasize
that the simulation is not intended to simulate specific
conditions within a molecular cloud. We focus our comparative
analysis on this simulation to illustrate the utility
of dendrograms even though the simulation box may not be
an excellent simulacrum of the L1448 region in particular.

5.3. The $^{13}$CO-to-H$_2$ Conversion Factor

We determine the scaling between $^{13}$CO luminosity and
molecular cloud mass by comparing the integrated intensity
of the $^{13}$CO emission to the extinction implied by the reddening
of background stars. We use the extinction map for
the L1448 region derived from deep JHK Calar Alto observations
using the Near Infrared Color Excess
Revisited (NICER; Lombardi & Alves 2001) technique
(Foster & Goodman 2006). The $^{13}$CO integrated intensity
map is convolved and regridded to match the resolution (48")
and astrometry of the extinction map. The extinction map
saturates above $A_V \approx 22$ mag, and we ignore the 13 pixels
with missing data in the analysis. Figure 8 shows the implied
column density (assuming $N({\rm H_2})/A_V = 9.4\times10^{20}$ cm$^{-2}$ mag$^{-1}$ and
$R_V = 3.1$; Bohlin et al. 1978) as a function of the integrated
intensity. We calculate a $^{13}$CO-to-H$_2$ conversion factor of



$$X_{\rm 13CO} = 8.0\times10^{20}\ \frac{\rm cm^{-2}}{\rm K\ km\ s^{-1}} \tag{5}$$



based on the mean of $N({\rm H_2})/W(^{13}{\rm CO})$ weighted by the inverse
variance of the column density estimates. As seen in
Figure 8, the single conversion factor is nowhere an excellent
approximation of the data, but represents an adequate mapping
between CO emission and column density over the entire
range. Since the conversion factor will ultimately be applied
to individual channels, only a simple ratio is appropriate
for the relationship; including complications such as nonlinearities
or a constant offset would introduce ambiguities in
translating from $^{13}$CO emission to mass for individual channels
in the data cube. The simple ratio systematically underestimates
the column density for high brightness regions where
the $^{13}$CO line saturates. As such, estimates for the virial parameter
are likely overestimates in these regions. In terms of
the mass calculations presented in §4, $X_2 = 4.0$. This is comparable
to the results of more sophisticated analyses of the
conversion factor ($X_2 = 2.1$; Pineda et al. 2008), though their
complex analysis is only applicable for total line intensity.
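
The weighted mean described above can be sketched in a few lines. The function and argument names are hypothetical, and the per-pixel extinction uncertainties are assumed to be available as an array; only the adopted $N({\rm H_2})/A_V$ ratio and the fiducial $2\times10^{20}$ normalization are taken from the text.

```python
import numpy as np

N_H2_PER_AV = 9.4e20     # cm^-2 per magnitude of A_V, as adopted in the text
X_FIDUCIAL = 2.0e20      # cm^-2 (K km/s)^-1, fiducial CO-to-H2 factor


def conversion_factor(A_V, sigma_A_V, W_co):
    """Inverse-variance weighted mean of N(H2)/W over map pixels."""
    N = N_H2_PER_AV * np.asarray(A_V, dtype=float)
    sigma_N = N_H2_PER_AV * np.asarray(sigma_A_V, dtype=float)
    ratio = N / np.asarray(W_co, dtype=float)
    weights = 1.0 / sigma_N ** 2          # inverse variance of column density
    return np.sum(weights * ratio) / np.sum(weights)


# Made-up pixels: extinction (mag), its uncertainty (mag), intensity (K km/s).
X = conversion_factor([1.0, 2.0, 3.0], [0.1, 0.1, 0.2], [1.0, 2.0, 4.0])
X2 = X / X_FIDUCIAL                       # value relative to the fiducial factor
```

Dividing the result by the fiducial factor gives the dimensionless $X_2$ that enters the mass calculation.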

5.4. The Dynamical State of L1448

We generate a dendrogram of the L1448 region with the
$^{13}$CO data using the methods discussed earlier. We identify local
maxima over a box that is 2 beam widths on a side and 5
channels deep in the data cube. From this set, we eliminate
redundant local maxima that are less than 1.2 K ($4\sigma_{\rm rms}$) above
the level at which the surfaces containing that local maximum
merge with other structures. We remind the reader that this
decimation preserves the overall structure of the dendrogram
and considers a representative subset of the topologically
important surfaces in the analysis (Figure 2). We calculate
the virial parameter for the region as discussed in §4.2 using
the bijection scheme to calculate isosurface properties. In
Figure 9, we plot the dendrogram of the region, color coding
Fig. 6. — Integrated intensity maps of $^{13}$CO emission from L1448 (left) and the simulation of Padoan et al. (2006) (right). In both images, the gray scale runs
linearly from 0 to 18 K km s$^{-1}$ on the $T_{\rm mb}$ scale and the contours run from 2-16 K km s$^{-1}$ in intervals of 2 K km s$^{-1}$ in the left-hand panel and from 4-10 K km s$^{-1}$
in the right-hand panel. The two maps come from data cubes with the same native resolution and noise levels.




Fig. 7. — Selected channel maps of the L1448 region. Contours follow the grayscale image with a contour interval of 1 K on the $T_{\rm mb}$ scale beginning at 1 K.
The LSR velocity of each map is indicated in the upper left-hand corner of each panel. The channel maps reveal the low velocity feature ($v_{\rm LSR} \sim 0.5$ km s$^{-1}$) not
otherwise discernible in the integrated intensity map. The 3D positions of the leaves of the dendrogram are indicated with the leaf numbers shown in Figure 5,
plotted in the channel map closest to their actual position.
Fig. 8. — Relationship between integrated intensity for $^{13}$CO and the
column density implied by reddening in the near infrared. A line representing
the mean ratio between the two quantities is shown.

points on the dendrogram with the corresponding value of the
virial parameter. We have calculated the errors in the virial
parameter and suppress reporting any values where the formal
errors in the virial parameter are larger than 50%. Figure
9 shows that several leaves of the dendrogram show evidence
for self-gravitation on small scales associated with individual
local maxima. Note that the left-hand branches (leaves
1-17) of the dendrogram appear self-gravitating, but for contour
levels < 1.5 K where the left and right branches merge,
the ensemble properties of the object revert to being unbound.
This change in dynamical state shows the main complex of
L1448 is dynamically distinct from the low velocity feature at
$v_{\rm LSR} \sim 0.5$ km s$^{-1}$ indicated by the branch on the right (leaves
18-22).
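
The first step of this procedure, the search for local maxima over a fixed box, can be sketched as below. This is an illustration under assumptions rather than the code used here: the function name and the `min_value` threshold are hypothetical, the box is given in pixels with odd side lengths (for 23" pixels and a 46" beam, 2 beam widths is about 4 pixels, rounded up to 5 in this sketch), and the subsequent decimation of maxima with less than $4\sigma_{\rm rms}$ contrast above their merge level is not implemented.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_maxima(cube, box=(5, 5, 5), min_value=-np.inf):
    """Indices of voxels that equal the maximum of the box centered on them.

    box must have odd lengths; ties on flat plateaus are all returned.
    """
    cube = np.asarray(cube, dtype=float)
    if any(b % 2 == 0 for b in box):
        raise ValueError("box lengths must be odd")
    # Pad with -inf so that boxes overlapping the cube edge remain valid.
    pad = [(b // 2, b // 2) for b in box]
    padded = np.pad(cube, pad, mode="constant", constant_values=-np.inf)
    box_max = sliding_window_view(padded, box).max(axis=(-3, -2, -1))
    return np.argwhere((cube == box_max) & (cube > min_value))
```

Each returned index is a candidate leaf; in the full method the candidates are then pruned by comparing each peak to the contour level at which its isosurface merges with a neighbor.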




Fig. 9. — Dendrogram of the L1448 region with branches of the dendrogram
colored according to the virial parameter at each point. Virial parameter
data are suppressed where the errors are larger than 50%. Several of
the leaves of the dendrogram show evidence for self-gravitation as do larger
structures in the data. Since physical properties are calculated for the isosurfaces
corresponding to the vertical branches of the figure, the horizontal branches
of the dendrogram have no data reported.

In general, the tops of the dendrogram leaves do not appear
as self-gravitating objects in this analysis. However, it is
precisely in these regions where the $^{13}$CO tracer saturates so
that these isosurfaces correspond to relatively more mass per
unit brightness than our simple conversion factor admits and
hence will be more tightly bound than we measure. Owing to
the difficulties in assessing the dynamical state of these small
objects, we emphasize the larger scales for self-gravitation in
our analysis. These large scales are the ones that CLUMPFIND and
GAUSSCLUMP fail to probe and where dendrograms provide novel
insight.

The multiscale analysis of the virial parameter allows us
to define objects that are potentially physically relevant to the
star formation process. We identify objects based on the criterion
that self-gravity makes a significant contribution to their
internal energetics. If we define a threshold for significant
self-gravity, namely $\alpha < 2$, we find "interesting" objects on a
variety of scales. Applying this criterion to the virial parameter
data shown in Figure 9 results in the dendrogram shown in
Figure 11 (left panel), where branches with $\alpha < 2$ are shaded.
Nearly all of the left-hand branch of the dendrogram corresponds
to a self-gravitating object, indicating the importance
of self-gravity over the entire L1448 region. There are also
three distinct sub-branches inside L1448 that show self-gravitation.
Figure 10 shows the locations and spatial extent
of the four leaves that show evidence of self-gravitation in the
data cube (leaves 2, 3, 5, and 18). The central, star-forming section of
L1448 is contained in leaf 3. Also interesting are the several
branches for which there are large regions with reliable measures
of the virial parameter which are not self-gravitating.
Referring to Figure 9, these branches have $\alpha \gtrsim 2$. Because
of the minimal influence of self-gravity on these structures,
we contend that these branches correspond to transient or
pressure-confined structures in the physical data.

Fig. 10. — Integrated intensity image of the L1448 region with the location
and extent of the self-gravitating leaves in the dendrogram indicated. Leaf 3
contains most of the star formation currently occurring in the region. Leaf 18
is the dynamically distinct feature at low $v_{\rm LSR}$ ($\sim 0.5$ km s$^{-1}$).



5.5. The Dynamical State of the Turbulent Simulation 

We have repeated the dendrogram analysis for the turbulent
simulation using the same algorithm parameters to establish
local maxima and contour the data. We adopt a $^{13}$CO-to-H$_2$
conversion factor of $X_2 = 10.9$ based on analysis of the simulated
$^{13}$CO data with respect to the simulated column density,
using the same analysis as was used in §5.3. The dendrogram
is presented in Figure 12, which can be compared to the
observed data in Figure 9. The simulated and observed data
cubes have similar numbers of leaves (local maxima) in their
respective data volumes (39 in the simulation vs. 26 in the
observations). The span of antenna temperatures is similar,
though most of the mergers in the simulated data cube
occur at higher levels than in the observed data. The principal
difference between the two dendrograms is that far more
Fig. 11. — Self-gravitating objects in the L1448 dendrograms based on the three different property calculation paradigms presented in §4.1. For each
dendrogram, regions with $\alpha < 2$ are shaded in red and regions where the data quality prohibits calculation of the properties are highlighted in gray. The extent of
the self-gravitating regime in parameter space depends on the paradigm adopted though many qualitative features are shared between the bijection and clipping
paradigms.



Fig. 12. — Dendrogram of the simulation data cube colored according to 
the virial parameter at each point. Virial parameter data are suppressed where 
the errors are larger than 50%. Nearly all of the structure in the simulated 
data cube corresponds to self-gravitating objects with a few leaves of the 
dendrogram representing unbound objects. 

of the simulated data cube corresponds to self-gravitating objects
than does the observed data cube. Regardless of the applicability
of the dendrogram interpretations, the analysis illustrates
a stark difference between the data cubes. The difference
in dynamical states arises from the amount of mass in the two
data cubes. Scaling the total emission in each data cube by
the respective conversion factors shows there is ~4 times as
much molecular mass in the simulation cube as there is in the
L1448 region, but this extra mass is spread over a similar line
width and spatial extent. As a result, self-gravity would play a
stronger role in the simulated data cube. The simulation does
not include the effects of gravity although our basic analysis
suggests that self-gravity would have a significant influence
on the simulated region.
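
The scaling behind this statement can be written out explicitly. The relation below assumes the virial parameter scales as $\sigma_v^2 R / M$ (the standard form, taken as an assumption here) and treats each data cube as a single object with a similar line width and size, so it is an order-of-magnitude illustration rather than a measurement.

```latex
\alpha \propto \frac{\sigma_v^2 R}{M}
\quad\Longrightarrow\quad
\frac{\alpha_{\rm sim}}{\alpha_{\rm obs}}
\approx \frac{M_{\rm obs}}{M_{\rm sim}} \approx \frac{1}{4}
\qquad (\sigma_v \text{ and } R \text{ similar}).
```

A fourfold increase in mass at fixed line width and size therefore lowers the virial parameter by about a factor of four, enough to move a marginally unbound structure with $\alpha$ of a few below the $\alpha < 2$ threshold.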

5.6. Interpretation of Dendrogram Properties 

The previous section discusses the physical meaning of
the dendrograms under the assumption that the "bijection"
paradigm holds relating objects in observational and physical
spaces. Previously (§4.1), we presented two other possibilities
for relating the observed and physical domains, namely
the clipping and extrapolation paradigms. We repeat the calculation
of the virial parameter in L1448 for these two possibilities
and present the results alongside the bijection results
in Figure 11. The extremely conservative clipping paradigm
finds no self-gravitating structure in the entirety of the L1448
cloud. Given that simple calculations suggest that the region







Fig. 13. — Fraction of emission contained within isosurfaces corresponding
to self-gravitating objects as a function of size scale for the L1448
$^{13}$CO($1 \to 0$) data and the simulated $^{13}$CO observations. At small scales,
very few objects are self-gravitating, but this fraction grows for larger size
scales. The simulations have roughly constant fractions of self-gravitation
across size scales.



has a virial parameter of $\alpha \sim 2$ and the presence of star-forming
clumps at the smallest scales, we conclude that the clipping
paradigm is overly conservative and the small structures
have more mass than is accorded to them. The extrapolation
paradigm finds more self-gravitating structure in the map
than the bijection, which is expected since the extrapolation
corrects the luminosity by a larger factor than the radius and
the line width (see Figure 5). However, it is interesting to note
that the same qualitative behavior is present in the extrapolated
results as is seen in the bijection analysis. In particular,
the analysis finds two dynamically distinct regions in L1448
corresponding to the left- and right-hand branches of the dendrogram.
However, the extrapolation results assume that every
object should have a brightness profile that runs continuously
from the peak value to the zero brightness isosurface,
and it may not be applicable in this case. For simplicity, we
utilize the bijection scheme for calculating the properties of
object substructure although extrapolation may be appropriate
in cases where there should be no background emission
(see

5.7. The Scale of Self-Gravity

Many previous authors have identified self-gravitating substructure
in their analysis of molecular emission. What, then,
makes the dendrogram analysis novel? Using by-eye identification
(Blitz & Stark 1986; Bertoldi & McKee 1992) or automated
algorithms such as CLUMPFIND or GAUSSCLUMP
invariably finds that the most massive objects on the smallest
physical scales sampled by the observations are closest to
being self-gravitating (Figure 3 in Bertoldi & McKee 1992).
However, segmentation tends to identify structures on small
scales (a few resolution elements), ignoring the objects at
larger scales which comprise the superstructure of the molecular
cloud. Dendrogram analysis avoids segmentation and
naturally includes these larger scales since lower valued isosurfaces
encompass more emission with larger spatial extents.

The multi-scale nature of the analysis naturally admits a
study such as that shown in Figure 13, which displays the
fraction of emission on given scales in L1448 and in the turbulent
simulation that has $\alpha < 2$. We construct the diagram by
measuring the virial parameter as a function of size scale for
all the isosurfaces in the data cube. This fraction is defined as
the sum over all isosurfaces



$$f(R) = \frac{\sum_i \{L_i \mid R_i \in [R, R+\Delta R],\ \alpha_i < 2\}}{\sum_i \{L_i \mid R_i \in [R, R+\Delta R]\}} \tag{6}$$



where $L_i$, $R_i$, and $\alpha_i$ are the luminosity, radius, and virial parameter
of the $i$th isosurface. We bin the virial parameter
data into bins of $\Delta R = 0.2$ dex in size scale and calculate
the fraction of luminosity in each bin contained within isosurfaces
corresponding to self-gravitating objects. This calculation
illustrates that only a small fraction of the structure
at small scales in L1448 corresponds to self-gravitating objects
and that fraction grows at larger scales. The saturation
of the $^{13}$CO line in small, bright regions makes this measurement
a lower limit since more of the leaves may correspond
to self-gravitating objects than are recovered in this analysis.
In contrast, self-gravity is important for nearly all structure
at all scales in the simulated observations. Several factors
may contribute to this discrepancy. Incomplete physics in the
simulation (lack of self-gravity, too high a density) or the
incomplete synthesis of spectral line maps (no depletion assumed)
may give discrepant results. Alternatively, the comparison
may be flawed and the section of the simulation box
used may not be appropriate for comparison to L1448.
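
The binned fraction in Equation 6 can be computed with a short sketch. The function name and the choice to start the first bin at the smallest radius in the sample are assumptions for this illustration; empty bins are returned as NaN.

```python
import numpy as np


def self_gravitating_fraction(L, R, alpha, bin_dex=0.2, alpha_max=2.0):
    """Luminosity fraction with alpha < alpha_max in logarithmic bins of R."""
    L = np.asarray(L, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    log_r = np.log10(np.asarray(R, dtype=float))
    # Bin index of each isosurface, counting from the smallest radius.
    idx = np.floor((log_r - log_r.min()) / bin_dex).astype(int)
    n_bins = idx.max() + 1
    total = np.bincount(idx, weights=L, minlength=n_bins)
    bound = np.bincount(idx, weights=L * (alpha < alpha_max), minlength=n_bins)
    frac = np.full(n_bins, np.nan)               # NaN marks empty bins
    np.divide(bound, total, out=frac, where=total > 0)
    left_edges = 10.0 ** (log_r.min() + bin_dex * np.arange(n_bins))
    return left_edges, frac
```

Each isosurface contributes its full luminosity to the bin containing its radius, so nested isosurfaces along one branch of the dendrogram populate successive size bins.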

5.8. The Size-Line Width Relationship in L1448 

In addition to identifying sets of isosurfaces that corre- 
spond to self-gravitating objects, the dendrogram technique 
also provides another way to probe the size-line width rela- 
tionship on intermediate scales inside the molecular clouds. 
The Principal Component Analysis methods of
Heyer & Schloerb (1997) and Heyer & Brunt (2004) were
developed to measure the structure function of turbulence within
molecular gas and provide an excellent descriptor of the tur- 
bulence. The dendrogram application provides a similar mea- 
surement by measuring the spatial and velocity extent of the 
isosurfaces within an emission data cube. Indeed, the virial 
analysis above can be thought of as comparing the spatial and
velocity extents of the isosurfaces to the amount of emission
they contain. We can construct the size-line width relationship
within a data cube by plotting the size vs. the line width of
the isosurfaces in the data. This becomes, in effect, a "Type
4" size-line width relationship as discussed in Goodman et al.
(1998): that is to say, using a single tracer species to measure
the relationship in a single cloud.

Fig. 14. - The size-line width relationship for the isosurfaces of $^{13}$CO
emission in L1448. The thermal line width (for T = 10 K) and the limits of
the instrumental resolution are also indicated in the plot; data below these
limits are unreliable. Individual points correspond to isosurfaces used in our
contouring. Gray circles indicate the characteristic size and line width for
each independent branch in the dendrogram and represent a useful minimum
sampling of the data. The gray line indicates the fit to the gray, circled data,
$\sigma_v = 0.6\,R^{0.58}$, which is comparable to the size-line width relationship found
among turbulent molecular clouds.

In Figure [14] we plot the size-line width relationship for
the $^{13}$CO emission from L1448. A fit to the gray, circled data
gives $\sigma_{\mathrm{NT}} = (0.62 \pm 0.04)\,R^{0.58 \pm 0.04}$ km s$^{-1}$, typical of the
turbulent gas in molecular clouds (Goodman et al. 1998). The
scaling relation for turbulent gas traces the data well at large
sizes and line widths. The size has been corrected for beam
convolution effects by subtracting the beam width in quadra- 
ture. Similarly, the line width has had the thermal contribu- 
tion of 10 K gas removed. These corrections are rough since a 
simple Gaussian deconvolution is inappropriate for the small 
isosurfaces defined by high brightness contours and the broad- 
ening effect of the spectrometer has been neglected. Errors 
arising from our approximate treatment will become relatively 
small for sizes and line widths that are significantly larger than 
the instrumental resolution. At large line widths, linear sets 
of points correspond to branches in the dendrogram. Along 
branches, the properties change slowly (see, for example, the 
radius and luminosity in Figure [5]). The discontinuities occur
when two isosurfaces merge together resulting in an abrupt 
change in the size and the line width of the isosurface. At 
small scales, the data dissolve into a sea of noise since the
properties of these surfaces are poorly defined in the face of 
thermal noise and instrumental convolution effects. The scat- 
ter around the average relationship results, in part, from the 
details of the isosurface shapes which vary due to turbulent 
velocity/density fluctuations but also due to noise. 
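
The corrections and the fit described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for Figure [14]: the function names are hypothetical, the thermal width is computed for a $^{13}$CO molecule (29 amu), and the power law is fitted by ordinary least squares in log-log space without weighting by the measurement errors.

```python
import numpy as np

K_B = 1.380649e-23     # Boltzmann constant, J/K
AMU = 1.66053907e-27   # atomic mass unit, kg

def deconvolve_size(r_obs, r_beam):
    """Subtract the beam width from the observed size in quadrature."""
    r_obs = np.asarray(r_obs, dtype=float)
    # Sizes smaller than the beam are clipped to zero rather than going imaginary.
    return np.sqrt(np.clip(r_obs**2 - r_beam**2, 0.0, None))

def nonthermal_width(sigma_obs, temperature=10.0, mass_amu=29.0):
    """Remove the thermal line width (km/s) of the tracer in quadrature."""
    # Assumed tracer mass: 29 amu for 13CO; temperature of 10 K as in the text.
    sigma_th = np.sqrt(K_B * temperature / (mass_amu * AMU)) / 1e3
    sigma_obs = np.asarray(sigma_obs, dtype=float)
    return np.sqrt(np.clip(sigma_obs**2 - sigma_th**2, 0.0, None))

def fit_power_law(radius, sigma):
    """Unweighted least-squares fit of sigma = a * R**b in log-log space."""
    slope, intercept = np.polyfit(np.log10(radius), np.log10(sigma), 1)
    return 10.0 ** intercept, slope
```

As the text notes, a simple quadrature subtraction of this kind is only rough for isosurfaces close to the resolution limit, so points near the beam size or thermal width should be excluded before fitting.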

Dendrograms can also be used to abstract the data set to a 
defining set of isosurfaces. The gray circles shown in Fig- 
ure [14] plot a single, characteristic size and line width for 
each branch of the dendrogram shown in Figure [9]. Since
the dendrogram contains 26 leaves, there are 51 independent
branches ($2N - 1$) because each branch is required to join with
others in a binary merger. Hence, only one point is plotted for 
every significantly distinct set of isosurfaces in the data and 
multiple points from redundant isosurfaces are suppressed. 
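
The branch count follows from the binary-merger requirement: each merger replaces two structures with one new one, so $N$ leaves need $N - 1$ mergers to reach a single root. A short illustrative check (the function name is hypothetical):

```python
def count_branches(n_leaves):
    """Count distinct branches when leaves merge pairwise into one root."""
    active = n_leaves   # structures that have not yet merged
    total = n_leaves    # every leaf is itself a branch
    while active > 1:
        active -= 1     # a binary merger replaces two structures with one
        total += 1      # the merged structure is a new branch
    return total

print(count_branches(26))  # 51, i.e. 2N - 1 for N = 26
```
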

6. IDENTIFYING GIANT MOLECULAR CLOUDS

An additional application of the dendrogram technique is 
to identify Giant Molecular Clouds (GMCs) in a blended data 
set. Massive, isolated molecular clouds show virial parameters
close to unity (Solomon et al. 1987; Heyer et al. 2001).
We propose that GMCs can be identified as the largest-scale 
self-gravitating structures in the ISM and such structures can 
be identified in the dendrogram analysis. Unlike the substruc- 
ture analysis presented previously, we are only interested in 
the dynamical state of the largest scale emission and the con- 
tamination by background emission is likely minimal. Hence, 
the property calculations for the dendrogram can use the ex- 
trapolation paradigm since we are interested in the properties 
at the 0 K km s$^{-1}$ isosurface.

To demonstrate this application of the dendrogram 
technique, we use the Orion-Monoceros $^{12}$CO data of
Wilson et al. (2005) taken with the CfA 1.2-m telescope. All
of the Orion-Monoceros complex is contained within a single
isosurface with $T_{\mathrm{mb}} = 0.4$ K which must be decomposed
into the constituent GMCs. We adopt the standard CO-to-H$_2$
conversion factor ($X_2 = 1$) and use the extrapolation paradigm
to calculate the virial parameter for each branch of the dendrogram.
We adopt a distance to the main Orion complex of
450 pc and use a distance of 800 pc for Monoceros, 425 pc
for NGC 2149, and 400 pc for the Northern Filament based
on the identifications and distance estimates of Wilson et al.
(2005).

Figure [15] shows the dendrogram for the region with sets of 
isosurfaces that have $\alpha < 2$ highlighted. We identify GMCs
in an automatic fashion as all emission contained within distinct,
self-gravitating regions with masses $M > 5 \times 10^4\,M_\odot$.
The three GMCs in the data cube naturally segregate from the
rest of the emission. In Figure [16] we show the emission contained
in the $T_{\mathrm{mb}} = 0.4$ K contour and its characteristic
designations. The dendrogram analysis identifies three regions
as GMCs and finds that the remaining emission is not suffi- 
ciently massive or self-gravitating to be identified as a GMC. 
Given the good agreement with the standard identifications, 
we conclude that the dendrogram method can be used to iden- 
tify GMCs in blended sets of emission. The primary restric- 
tion is good knowledge of the distances to different regions 
of the data volume. This limitation implies that the method 
is best applied in the outer galaxy or in extragalactic analysis 
where distances are well-determined. 
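
The selection rule described in this section can be sketched as a top-down walk of the dendrogram. This is a minimal illustration rather than the implementation used for Figure [15]: the `Branch` class, the function name, and the assumption that each branch already carries a mass and virial parameter are introduced here for the example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Branch:
    name: str                 # hypothetical label for the branch
    mass: float               # mass in solar masses
    alpha: float              # virial parameter
    children: List["Branch"] = field(default_factory=list)

def find_gmcs(root, alpha_crit=2.0, min_mass=5e4):
    """Return the largest-scale branches with alpha < alpha_crit and mass > min_mass."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.alpha < alpha_crit and node.mass > min_mass:
            # Largest qualifying scale on this path: keep it and do not descend,
            # so substructure inside a GMC is not reported separately.
            found.append(node)
        else:
            stack.extend(node.children)
    return found
```

Starting from the root and stopping at the first qualifying branch on each path yields objects that are self-gravitating but not bound to each other, since any parent that joins two of them has already failed the test.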

7. SUMMARY 

We have presented a new application of tree diagrams to 
three-dimensional data sets. This application is closely related
to the structure tree analysis of Houlahan & Scalo (1992).
These techniques use dendrograms to represent the merg- 
ing/bifurcating of contours in a three-dimensional data set as 
a function of contour level. Each point in the dendrogram cor- 
responds to an isosurface in the data cube. By characterizing 
molecular emission associated with these isosurfaces, we are 
able to measure the properties of both small- and large-scale 
structures in a data set. In particular, we emphasize determi- 
nations of the virial parameter and the size-line width rela- 
tionship at multiple scales in the data. The virial parameter, in 
particular, provides a means to estimate the influence of self- 
gravity on a variety of scales in the molecular cloud yielding, 
for the first time, a uniform study of energetics on a range of 
scales. 

The dendrogram technique is philosophically different from 
segmentation algorithms such as CLUMPFIND. By preserving
and characterizing the hierarchy of emission isosurfaces
in a data cube, it is possible to study structures over a range
of scales. In principle, the results of the hierarchical decomposition
are independent of algorithm parameters, though the
actual output is governed by the degree of simplification desired
by the user. Although the dendrogram analysis does not
segment the data by itself, the results can be used to provide 
a physically-motivated segmentation of objects in some sys- 
tems. 

Despite the power of the dendrogram technique, there are
worrisome ambiguities in relating the observed domain to the
physical domain. We have presented three attempts to account
for ambiguities in this relationship, but find no satisfactory, 
universally-applicable method. We note that many of our re- 
sults are subject to caveats regarding interpretations, but we 
argue that these caveats do not undermine the applicability of 
the techniques. 

We have analyzed $^{13}$CO($1 \rightarrow 0$) emission from the L1448
region as observed by the COMPLETE survey of Perseus
(Ridge et al. 2006) using the dendrogram methods. We note
that common structure analysis techniques have a fundamen- 
tal scale built into their analysis and tend to analyze the dy- 
namical state of objects on that scale. As such, the synthe- 
sis of many such analytic studies conducted on various scales 
leaves an impression that the dendrogram analysis actually 
demonstrates. We find self-gravitating structures on all scales 
in L1448, though not in all regions. In particular, the majority 
of emission in small-scale structures is not self-gravitating; 
but, at larger scales, much of the L1448 region is influenced 
by self-gravity.

We have also illustrated the capacity for the dendrogram 
technique to make differential measurements between data 
sets. Comparing the dendrogram of L1448 to a dendrogram
of a theoretical simulation reveals qualitative and quantitative
differences. Differences of this magnitude were not
discernible through other statistical techniques such as
determinations of the power spectra (Padoan et al. 2006). Future
work will investigate further applications of the differential 
measurement techniques between dendrograms. 

The dendrogram technique can be used to measure the 
size-line width relationship within molecular clouds using the 
characteristic sizes and line widths of the constituent isosur- 
faces in the data. As expected, we recover the typical size-
line width relationship for molecular clouds, $\sigma_v \propto R^{0.58}$, within
a single cloud.

Finally, we conclude the paper by presenting an alterna- 
tive application of dendrograms: the identification of Giant 
Molecular Clouds in blended line data sets. We define GMCs 
as massive ($M > 5 \times 10^4\,M_\odot$) clouds of gas that are (a)
self-gravitating but (b) not bound to their surrounding medium.
This definition not only identifies GMCs but does so while
excluding low-mass chaff that is dynamically unrelated
to the GMCs. Using this simple definition, the dendrogram
technique readily identifies the three constituent GMCs in the
blended Orion-Monoceros data of Wilson et al. (2005).

Beginning with a common application of techniques developed
previously, this new perspective on dendrograms illustrates
their utility for the visualization and reduction of molecular
line data. Dendrograms reduce three-dimensional hierarchical
data sets to a two-dimensional plot that retains essential
features regarding the topology of the emission. This reduc- 
tion is conducted in a fashion that is minimally model depen- 
dent, relying on the intrinsic structure of the isosurfaces in an 
emission line data set. 

Fig. 15. - The dendrogram of the Orion-Monoceros region. Branches of the dendrogram corresponding to self-gravitating structures are highlighted in red.
Regions where the quality of the data prohibits accurate estimation of the virial parameter are shown in gray. The GMCs within the data cube are identified as
the largest scale objects that are self-gravitating but not bound to each other. Regions of the dendrogram corresponding to specific objects are labeled and the
sections of the dendrogram corresponding to GMCs are shaded in yellow.

Fig. 16. - Map of emission for the Orion-Monoceros region contained
within a $T_{\mathrm{mb}} = 0.4$ K contour. The three constituent GMCs in the complex
have been identified using the dendrogram analysis and their boundaries are
indicated in red. The regions are labeled according to their designations in
Wilson et al. (2005).